HuBERT is a self-supervised speech representation learning model that uses an offline clustering step to generate aligned target labels for a BERT-like prediction loss. It is suitable for speech recognition, generation, and compression tasks.
Speech Recognition
Transformers English
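
To make the description's idea concrete, here is a minimal pure-Python sketch of how an offline clustering step can supply frame-aligned targets for a BERT-like prediction loss. It is an illustration under stated assumptions, not the model's actual training code: the function names `kmeans` and `masked_prediction_loss`, the toy 2-D frame features, and the choice to score only masked frames are all hypothetical.

```python
import math
import random


def kmeans(frames, k, iters=10, seed=0):
    """Tiny k-means over per-frame feature vectors.

    Returns (centroids, labels); labels[i] is the cluster id of frames[i],
    so every frame gets exactly one aligned discrete target.
    """
    rng = random.Random(seed)
    centroids = [list(c) for c in rng.sample(frames, k)]
    labels = [0] * len(frames)
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        for i, frame in enumerate(frames):
            labels[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(frame, centroids[c])),
            )
        # Update step: move each centroid to the mean of its assigned frames.
        for c in range(k):
            members = [f for f, lab in zip(frames, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels


def masked_prediction_loss(logits, targets, mask):
    """Mean cross-entropy against cluster ids, over masked frames only.

    Restricting the loss to masked frames is an assumption of this sketch.
    """
    total, count = 0.0, 0
    for frame_logits, target, is_masked in zip(logits, targets, mask):
        if not is_masked:
            continue
        # Numerically stable log-sum-exp for the softmax normaliser.
        peak = max(frame_logits)
        log_z = peak + math.log(sum(math.exp(x - peak) for x in frame_logits))
        total += log_z - frame_logits[target]
        count += 1
    return total / count if count else 0.0


# Toy usage: four frames, two clusters, the middle two frames masked.
toy_frames = [[0.0, 0.0], [0.2, 0.1], [5.0, 5.0], [5.3, 5.1]]
_, toy_targets = kmeans(toy_frames, k=2)
toy_mask = [False, True, True, False]
# Hypothetical model outputs: one score per cluster for each frame.
toy_logits = [[0.0, 0.0] for _ in toy_frames]
print(masked_prediction_loss(toy_logits, toy_targets, toy_mask))
```

In a real pipeline the clustering would run once, offline, over features of the whole training corpus, and the resulting labels would then be held fixed while the model learns to predict them at masked positions.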